1.  INTRODUCTION  and  ABSTRACT

Since the work of Robbins and Monro [1] and Kiefer and Wolfowitz [2], much
attention has been devoted to the problems of stochastic approximation [3, 4].

In recent years, work on the recursive estimation algorithms in adaptive
control and communication systems has both rekindled widespread interest and
required results under rather different assumptions on the noise and dynamics
than were used in the early years (see, e.g., [5-10]). Problems with constraints
have been treated by similar methods.

A typical form of current interest is the following: The iterate sequence
is given by

(1.1)  $X_{n+1} = X_n + a_n h(X_n,\xi_n)$,  $X_n \in R^r$, Euclidean $r$-space,

where $\sum_n a_n = \infty$, $0 < a_n \to 0$ as $n \to \infty$, and $h(\cdot,\cdot)$ is not
necessarily continuous. For example, it might be an indicator function. Also,
the noise sequence $\{\xi_n\}$ might depend on $\{X_n\}$ in a complicated way. The
references [5]-[10] contain a variety of techniques which are useful for
proving w.p.1 or weak convergence for fairly general types of stochastic
approximations, both with and without constraints. But they are not good
enough to treat many problems where $h(\cdot,\cdot)$ is discontinuous or $\{\xi_n\}$
is 'state dependent.'

In [11] and [12], averaging methods were used to get weak convergence
results for suitably scaled stochastic difference equations, where the
'dynamical' term corresponding to our $h(\cdot,\cdot)$ might have the properties
mentioned above. Here we adapt the method of [11] and [12], with that of [6],
to develop a technique which is quite useful and versatile for the problems
of interest.

In a sense, we rely on the assumption that even if $h(\cdot,\cdot)$ is not smooth,
expectations or conditional expectations of the types $Eh(\cdot,\xi_n)$ or
$E[h(\cdot,\xi_n) \mid \xi_{n-1}, \xi_{n-2}, \ldots]$ are smooth functions of $x$. This
situation occurs in many cases, as attested to by the examples in Section 5.
Results for both w.p.1 and weak convergence (see remark at end of Section 4)
are available.


We also obtain analogous results for the following projection algorithm.
Let $q_1(\cdot), \ldots, q_m(\cdot)$ denote continuously differentiable functions and
define $G = \{x : q_i(x) \le 0,\ i = 1, \ldots, m\}$. Let $\pi_G(y)$ denote any closest
point in $G$ to $y$. Then the algorithm is defined by

(1.2)  $X_{n+1} = \pi_G(X_n + a_n h(X_n,\xi_n))$.

In Section 2, we treat the algorithm (1.1), and the projected algorithm is
treated in Section 3. A method for 'state dependent' noise $\{\xi_n\}$ is given in
Section 4, and Section 5 contains two non-standard examples.
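To make the form (1.1) concrete, the following minimal sketch runs the recursion with a discontinuous $h$. The specific choices are assumptions made only for illustration and do not come from the text: the indicator-type function $h(x,\xi) = p - I\{\xi \le x\}$ (a quantile search), independent standard Gaussian noise, the gain $a_n = 5/(n+10)$, and the names `robbins_monro_quantile`, `level` and `seed`. Although $h$ jumps in $x$, its mean $p - P\{\xi \le x\}$ is smooth, which is the situation described above.

```python
import random


def robbins_monro_quantile(level, n_steps, seed=0):
    """Iterate X_{n+1} = X_n + a_n h(X_n, xi_n) as in (1.1).

    Illustrative assumption: h(x, xi) = level - 1{xi <= x}, which is
    discontinuous in x, while its mean level - P(xi <= x) is smooth.
    """
    rng = random.Random(seed)
    x = 1.0  # arbitrary starting point (assumption of this sketch)
    for n in range(n_steps):
        a_n = 5.0 / (n + 10)  # sum a_n = infinity, sum a_n^2 < infinity
        xi = rng.gauss(0.0, 1.0)  # i.i.d. noise (simplest possible case)
        indicator = 1.0 if xi <= x else 0.0
        x += a_n * (level - indicator)
    return x


# With level = 0.5 the zero of the averaged dynamics is the median, 0.
median_estimate = robbins_monro_quantile(0.5, 20000)
print(median_estimate)
```

The printed iterate should lie close to zero; how close depends on the gain sequence and the run length chosen here.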


2.  w.p.1  CONVERGENCE  FOR  (1.1)

Assumptions. Write $\delta X_n$ for $X_{n+1} - X_n$, and let $\hat C_0^2$ denote the
space of real valued continuous functions on $R^r$ with compact support and
continuous second partial derivatives. Let $E_n$ denote expectation conditioned
on $\{\xi_j,\ j < n\}$, and $K$ will be used to denote a constant (its value might
change from usage to usage).

One of the key difficulties is proving w.p.1 boundedness of $\{X_n\}$. For
this we use a stability assumption on the differential equation $\dot x = \bar h(x)$,
where $\bar h(x)$ is (very loosely speaking) $Eh(x,\xi_j)$. The boundedness argument
uses a perturbed form of the Liapunov function $V(\cdot)$ for that differential
equation, and various differences or derivatives of $V(\cdot)$ appear in the
(mixing type) conditions. Owing to this, some of the conditions might seem at
first glance a little unnatural, but they are in fact often readily
verifiable. Theorem 1 and its conditions should be viewed as a prototype of a
method which can be adapted to a wide variety of problems.


The assumptions are written such that (A1)-(A4) can be used for both
bounded and unbounded $\{\xi_n\}$. With bounded $\{\xi_n\}$, we can let the
$\alpha_{in} = K$, all $i, n$. Conditions (A6) and (A7) would not often hold as
stated when $\{\xi_n\}$ is unbounded. The forms of (A6) and (A7) which are useful
for the unbounded noise case depend on the particular form of $h(\cdot,\cdot)$ and it
does not seem reasonable to try to get the most general form here. The
unbounded case will be discussed after the theorem, and in the examples.
Because we want a proof which can be used with only minor changes when
$\{\xi_n\}$ is unbounded, the details are a little more complicated than necessary.

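A1.  $\sum_n a_n^2 < \infty$, $\sum_n a_n = \infty$, $a_n > 0$, $\{a_{n+1}/a_n\}$ is bounded.
$h(\cdot,\cdot)$ is measurable and for each compact set $Q$ there is a sequence
$\{\alpha_{0n}\}$ such that $|h(x,\xi_n)| \le \alpha_{0n}$ for $x \in Q$ and
$\sum_n E\alpha_{0n}^2 a_n^2 < \infty$.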

A2.  There is a twice continuously differentiable Liapunov function
$V(\cdot) \ge 0$ such that $|V_{xx}(\cdot)|$ is bounded, $V(x) \to \infty$ as $|x| \to \infty$.
There are $\varepsilon_0 > 0$ and $\lambda_0 < \infty$ such that for
$x \notin Q_0 = \{x : V(x) \le \lambda_0\}$, we have $V_x'(x)\bar h(x) \le -\varepsilon_0$,
where $\bar h(\cdot)$ is defined in (A3).

A3.  Let $\{\alpha_{1n}\}$ denote a sequence of random variables such that
$\sum_n E\alpha_{1n}^2 < \infty$. There is a continuously differentiable $\bar h(\cdot)$ such
that the limit defined (pointwise in $x$) by

$V_0(x,n) = \sum_{j=n}^{\infty} a_j V_x'(x) E_n[h(x,\xi_j) - \bar h(x)]$

exists and (together with the partial sums) is bounded by
$\alpha_{1n}(1 + |V_x'(x)\bar h(x)|)$. The bound also holds if $V(\cdot)$ is replaced by a
continuously differentiable function with compact support ($\{\alpha_{1n}\}$ can
depend on the function).

A4.  There is a random sequence $\{\alpha_{2n}\}$ such that

$E_n|h(x,\xi_n)|^2 \le \alpha_{2n}^2 (1 + |V_x'(x)\bar h(x)|)$

and $a_n\alpha_{2n}^2 \to 0$ w.p.1 as $n \to \infty$.

A5.  $|V_x'(x)\bar h(x)| \le K(1 + V(x))$.

A6.  With $[\ ]_x$ denoting gradient with respect to $x$, let

$\left|\sum_{j=n}^{\infty} a_j [V_x'(x)(E_n h(x,\xi_j) - \bar h(x))]_x\right| \le K a_n (1 + |V_x'(x)\bar h(x)|^{1/2})$.

The inequality holds with $V(\cdot)$ replaced by an arbitrary continuously
differentiable function with compact support.

A7.  For $0 \le s \le 1$,

$E_n|V_x'(x + s a_n h(x,\xi_n))\,\bar h(x + s a_n h(x,\xi_n))| \le K(1 + |V_x'(x)\bar h(x)|)$.

Remark. As seen from the Examples in Section 5, the assumptions include
some hard and interesting cases. In a sense, the 'prototype' model for
(A1)-(A7) is the case where $|h(\cdot,\cdot)|$ and $|\bar h(\cdot)|$ have at most a linear
growth in $|x|$ as $|x| \to \infty$ and $V(\cdot)$ has a growth one order higher than
that of $h(\cdot,\cdot)$ and $\bar h(\cdot)$. Then the bounds in the assumptions make sense,
under various mixing type conditions on $\{\xi_n\}$. Here, $\{\xi_n\}$ is not treated
as being explicitly 'state dependent.' In the state dependent case, we must
take into account the way that $\{\xi_n\}$ evolves as a function of $\{X_n\}$, and
use a slightly different form of $V_0(x,n)$. See Section 4 and Example 2.
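As an illustration of the 'prototype' model in the remark, the worked check below takes a hypothetical scalar example that is not from the text: $h(x,\xi) = -x + \xi$ with $\{\xi_n\}$ independent, bounded, mean zero and of variance $\sigma^2$, together with the assumed choices $\bar h(x) = -x$ and $V(x) = x^2$. It is only a sketch of how the bounds in (A2)-(A5) are read in the simplest case.

```latex
% Hypothetical example: h(x,\xi) = -x + \xi, \bar h(x) = -x, V(x) = x^2.
\begin{align*}
V_x(x)\bar h(x) &= -2x^2 \le -2\lambda_0 =: -\varepsilon_0
   \quad\text{for } V(x) \ge \lambda_0
   && \text{(A2)}\\
E_n[h(x,\xi_j) - \bar h(x)] &= E_n \xi_j = 0 \quad (j \ge n),
   \quad\text{so } V_0(x,n) \equiv 0
   && \text{(A3), with } \alpha_{1n} = 0\\
E_n |h(x,\xi_n)|^2 &= x^2 + \sigma^2
   \le \max\{\tfrac12,\sigma^2\}\,(1 + 2x^2)
   = \alpha_{2n}^2\,(1 + |V_x(x)\bar h(x)|)
   && \text{(A4)}\\
|V_x(x)\bar h(x)| &= 2V(x) \le 2\,(1 + V(x))
   && \text{(A5)}
\end{align*}
```

Here $h$ and $\bar h$ grow linearly and $V$ quadratically, and $a_n\alpha_{2n}^2 \to 0$ holds because $\alpha_{2n}$ is constant and $a_n \to 0$.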

Theorem 1. Assume (A1)-(A7). The sequence $\{X_n\}$ is bounded w.p.1.
If $V_x'(x)\bar h(x) \le 0$ for all $x$, then $X_n \to \Lambda_0 = \{x : V_x'(x)\bar h(x) = 0\}$
w.p.1. Otherwise, $\{X_n\}$ converges w.p.1 to the largest bounded invariant set*
of

(2.1)  $\dot x = \bar h(x)$.

*Let $S$ denote a bounded invariant set of (2.1). Then for each $x \in S$, there
is a trajectory $x(\cdot)$ of (2.1) contained in $S$ for $t \in (-\infty,\infty)$ and
$x(0) = x$. The invariant set is the set of all limit points of bounded
trajectories (on $[0,\infty)$) of (2.1).

Remark. If $V_x'(x)\bar h(x)$ is not $\le 0$ for all $x$ or if $\Lambda_0$ contains more
than one point, then the limit set of $\{X_n\}$ might be more than one point, or
even be a non-degenerate trajectory of (2.1). Such possibilities do exist in
applications. But the theorem can be refined as follows. Let $x_0 = x(t)$ be
an asymptotically stable solution of (2.1) (in the sense of Liapunov) with
domain of attraction $DA(x_0)$. There is a null set $\Omega_0$ such that if
$\omega \notin \Omega_0$ and $X_n(\omega) \in$ compact $A \subset DA(x_0)$ infinitely often, then
$X_n(\omega) \to x_0$. The proof follows from the techniques of the proof below
and the proof of Theorem 2.3.1 of [6].


which we rewrite as

(2.4)  $-a_n V_x'(X_n)E_n h(X_n,\xi_n) + a_n V_x'(X_n)\bar h(X_n)$
       $+\ E_n\,\delta X_n' \int_0^1 ds \sum_{j=n+1}^{\infty} a_j [V_x'(X_n + s\,\delta X_n)E_{n+1}(h(X_n + s\,\delta X_n,\xi_j) - \bar h(X_n + s\,\delta X_n))]_x$.

By (A6) and (A7) and an application of Schwarz's inequality, the last term of
(2.4) is bounded by

(2.5)  $K a_n^2 (1 + |V_x'(X_n)\bar h(X_n)|)$.

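Define the 'averaged' Liapunov function $\tilde V(n) = V(X_n) + V_0(X_n,n)$ and
note that by (A3),

(2.6a)  $|V_0(X_n,n)| \le \alpha_{1n}(1 + |V_x'(X_n)\bar h(X_n)|)$,

(2.6b)  $\tilde V(n) \ge -O(\alpha_{1n})$,

where $\alpha_{1n} \to 0$ w.p.1 as $n \to \infty$. Combining (2.2)-(2.5),

(2.7)  $E_n\tilde V(n+1) - \tilde V(n) = a_n(1+\delta_n)V_x'(X_n)\bar h(X_n) + \tilde\delta_n a_n$,

where $\delta_n$ and $\tilde\delta_n$ go to zero w.p.1 as $n \to \infty$.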

Define $m_i$ by

(2.8)  $\tilde V(n) - \sum_{i=0}^{n-1} a_i(1+\delta_i)V_x'(X_i)\bar h(X_i) - \sum_{i=0}^{n-1}\tilde\delta_i a_i = \sum_{i=0}^{n-1} m_i = M_n$.

By (2.7), $\{M_n\}$ is a martingale. By modifying $\{X_n, \xi_n, \delta_n, \tilde\delta_n\}$ on a set
of arbitrarily small probability, we can suppose that there is an $N_0 < \infty$
such that $|\tilde\delta_i| \le \varepsilon_0/4$, $|\delta_i| \le 1/4$ for $i \ge N_0$. This modification
will not alter the conclusions.

Let $n_0$ be a stopping time $\ge N_0$ and such that $X_{n_0} \notin Q_0$ (with $n_0$
equal to $\infty$ if $X_n \in Q_0$, all $n \ge N_0$). Define
$n_1 = \min\{n : n > n_0,\ X_n \in Q_0\}$. Then by (2.6), (2.7) the sequence
$\{\tilde V(n \wedge n_1),\ n \ge n_0\}$ is a supermartingale which is bounded below by
$-O(\alpha_{1n})$. The facts that $E_n\tilde V(n+1) - \tilde V(n) \le -\varepsilon_0 a_n/2$ for
$X_n \notin Q_0$ and $n$ large, and $\sum a_n = \infty$ imply that $X_n \in Q_0$ infinitely
often w.p.1. Define $Q_1 = \{x : V(x) \le \lambda_1\}$ where $\lambda_1 > \lambda_0$. Let $Q_1$ be
the $Q$ of (A1).


By a modification of the paths on a set of arbitrarily small measure, we
can suppose that $|a_n h(x,\xi_n)| \le a_n\alpha_{0n} \le [\ldots]$ for large $n$ and
$x \in Q_1$ (say, for convenience, $n \ge N_0$). This modification will not affect
the conclusions. By (A1) and (A3), there are real $K_i(Q_1)$ such that if
$X_n \in Q_1$,

(2.9)  $|m_n|^2 \le K_1(Q_1)[\,|V_0(X_{n+1},n+1)|^2 + |V_0(X_n,n)|^2 + a_n^2 + |V(X_{n+1}) - V(X_n)|^2\,]$
       $\le K_2(Q_1)[\,a_n^2 + \alpha_{1n}^2 + \alpha_{1,n+1}^2 + a_n^2|h(X_n,\xi_n)|^2\,]$
       $\le [\ldots]$

Let $n_2$ be any stopping time such that $X_{n_2} \in Q_0$. Let
$n_3 = \min\{n : X_n \notin Q_1,\ n \ge n_2\}$. Then by (2.9) (note that the right side
is summable over all return periods in $Q_1$),

(2.10)  $P\{\sup_{n_2 \le n < n_3} |\sum_{i=n_2}^{n} m_i| \ge \varepsilon\} \le K(Q_1)\,E\sum_{i=n_2}^{n_3-1} [\ldots]$.

We conclude from (2.8) and (2.10) and the recurrence of $Q_0$ and the facts
that $a_n h(X_n,\xi_n) \to 0$ w.p.1 for $X_n \in Q_1$ and $V_x'(x)\bar h(x) \le -\varepsilon_0$ for
$x \notin Q_0$ that $X_n$ eventually remains in $Q_1$ for any $\lambda_1 > \lambda_0$. Hence
$M_n$ converges w.p.1.

Furthermore, since $M_n$ converges w.p.1 and $V_0(X_n,n) = O(\alpha_{1n}) \to 0$
w.p.1 (since $X_n$ eventually remains in the bounded set $Q_1$),

(2.11)  $\sup_{m \ge n} |V(X_m) - V(X_n) - \sum_{i=n}^{m-1} a_i V_x'(X_i)\bar h(X_i)| \to 0$

w.p.1 as $n \to \infty$. If $V_x'(x)\bar h(x) \le 0$ all $x$, then (2.11) implies that
$X_n \to \{x : V_x'(x)\bar h(x) = 0\}$.

Next, fix $f(\cdot) \in \hat C_0^2$, and repeat the development with $f(\cdot)$ replacing
$V(\cdot)$. This yields

$\sup_{m \ge n} |f(X_m) - f(X_n) - \sum_{i=n}^{m-1} a_i f_x'(X_i)\bar h(X_i)| \to 0$

w.p.1 as $n \to \infty$. In particular, since for large $n$, $X_n \in Q_1$, we have

(2.12)  $\sup_{m \ge n} |X_m - X_n - \sum_{i=n}^{m-1} a_i \bar h(X_i)| \to 0$ w.p.1 as $n \to \infty$.

Equation (2.12) implies that (w.p.1) the limit points are contained in the set
of limit points of the bounded trajectories of (2.1). Q.E.D.
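The tracking property (2.12) can be checked numerically in a simple case. The sketch below is an illustration only and rests on assumptions that are not in the text: the scalar example $h(x,\xi) = -x + \xi$ with independent noise uniform on $[-1,1]$, so that $\bar h(x) = -x$, the gain $a_i = 1/(i+1)$, and the hypothetical function name `tracking_error`. It evaluates the supremum in (2.12) over a finite window $n \le m \le$ `n_end`.

```python
import random


def tracking_error(n_start, n_end, seed=1):
    """Return max over n_start <= m <= n_end of
    |X_m - X_n - sum_{i=n}^{m-1} a_i hbar(X_i)|, with n = n_start.

    Assumed example: h(x, xi) = -x + xi, hbar(x) = -x, a_i = 1/(i+1).
    """
    rng = random.Random(seed)
    x = 3.0          # arbitrary initial condition
    x_start = None   # X_n, recorded when i reaches n_start
    drift_sum = 0.0  # running value of sum a_i * hbar(X_i)
    worst = 0.0
    for i in range(n_end + 1):
        if i == n_start:
            x_start = x
        if i >= n_start:
            worst = max(worst, abs(x - x_start - drift_sum))
        a_i = 1.0 / (i + 1)
        xi = rng.uniform(-1.0, 1.0)
        if i >= n_start:
            drift_sum += a_i * (-x)
        x += a_i * (-x + xi)
    return worst


early = tracking_error(100, 5000)
late = tracking_error(2000, 5000)
print(early, late)
```

In this example the error equals the tail of the martingale $\sum a_i\xi_i$, so it is typically much smaller for the later starting index, in line with (2.12).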

Remarks on the unbounded noise case. (A6) and (A7), which were used only
to get the bound (2.5), would not hold very often if $\{\xi_n\}$ were unbounded.
If (2.5) were replaced by $a_n^2\alpha_{3n}(1 + |V_x'(X_n)\bar h(X_n)|)$ where
$\sum_n E a_n^2\alpha_{3n}^2 < \infty$, and if this inequality holds with an arbitrary
continuously differentiable $f(\cdot)$ with compact support replacing $V(\cdot)$,
then the proof goes through with only minor changes. The $\{\alpha_{3n}\}$ can
depend on $f(\cdot)$.

3.  THE  PROJECTION  METHOD 

Recall the definition of $G$ and $\pi_G$ from Section 1. Let $\pi(\bar h(\cdot))$
denote the (not necessarily unique) projection of the vector field $\bar h(\cdot)$
onto $G$; i.e.,

$\pi(\bar h(x)) = \lim_{\Delta \downarrow 0} [\pi_G(x + \Delta\bar h(x)) - x]/\Delta$.

We will use

A8.  The $q_i(\cdot)$, $i = 1, \ldots, m$, are continuously differentiable, $G$ is
bounded, and is the closure of its interior $G^0 = G - \partial G =
\{x : q_i(x) < 0,\ i = 1, \ldots, m\}$.


In lieu of (A6), we use the weaker ('unbounded' noise) condition:

A9.  Let $\{\alpha_{2n}\}$ be (see (A4)) a random sequence satisfying
$E_n|h(x,\xi_n)|^2 \le \alpha_{2n}^2$ for $x \in G$. Let $\{\alpha_{4n}\}$ be a random sequence
satisfying (for $x \in G$)

$\left|\sum_{j=n}^{\infty} a_j [E_n h(x,\xi_j) - \bar h(x)]_x\right| \le \alpha_{4n}$,

and let $\alpha_{2n}E_n^{1/2}\alpha_{4,n+1}^2 \to 0$ and $\alpha_{2n}^2 a_n \to 0$ w.p.1 as $n \to \infty$.

Remark. The assumption that $G$ is the closure of its interior is useful
in visualizing constructions in the proof, and slightly simplifies the details.
The theorem remains valid when there are only equality constraints. Then, of
course, $\{X_n\}$ moves on the constraint surface. If $\{\xi_j\}$ is bounded and
satisfies a sufficiently strong mixing condition, then the $\alpha_{4n}$ in (A9) is
$O(a_n)$ and the last requirement of (A9) holds by (A4). Condition (A9) is used
only to show that (3.4) is of the order given below (3.4).


Theorem 2. Assume (A1), (A3) (with the $V$ term dropped and for $x \in G$),
(A8), (A9). Then (w.p.1) the limit points of $\{X_n\}$ are those of the
'projected' ODE

(3.1)  $\dot x = \pi(\bar h(x))$.

Let $H(\cdot) \ge 0$ be a real valued function with continuous first and second
partial derivatives and define $\bar h(\cdot) = -H_x(\cdot)$. Then, as $n \to \infty$, $\{X_n\}$
converges w.p.1 to the set $KT = \{x : \bar h'(x)\pi(\bar h(x)) = 0\}$.

Remarks. The last remark after the statement of Theorem 1 also holds here.
The form $\bar h(\cdot) = -H_x(\cdot)$ arises in the projected form of the Kiefer-Wolfowitz
procedure, where we seek to minimize the regression $H(\cdot)$, subject to $x \in G$.
Proof. Except for the treatment of certain projection terms, the proof is
quite similar to that of Theorem 1. Since $G$ is bounded, the Liapunov
function $V(\cdot)$ is not required and (A3, A6) and extensions need only be applied
with $f(\cdot)$ an arbitrary real valued function with continuous second partial
derivative, in order to characterize the limit points, similarly to what was
done in Theorem 1.

We have

(3.2)  $E_n f(X_{n+1}) - f(X_n) = a_n f_x'(X_n)E_n h(X_n,\xi_n) + a_n f_x'(X_n)E_n\tau_n$
       $+\ a_n^2 \int_0^1 E_n(\delta X_n/a_n)' f_{xx}(X_n + s\,\delta X_n)(\delta X_n/a_n)(1-s)\,ds$,

where $\tau_n$ is the 'projection error':

$\tau_n = [\pi_G(X_n + a_n h(X_n,\xi_n)) - (X_n + a_n h(X_n,\xi_n))]/a_n$.

If $X_n + a_n h(X_n,\xi_n) \notin G$, but there is a unique $i$ such that
$X_{n+1} \in \partial G_i = \{x : q_i(x) = 0\}$, then $\tau_n$ points 'inward' at $X_{n+1}$,
and, in fact, $\tau_n = -\lambda_i q_{i,x}(X_{n+1})$ for some $\lambda_i \ge 0$. In general,
suppose that $X_n + a_n h(X_n,\xi_n) \notin G$, but (with a reordering of the indices,
if necessary) $X_{n+1}$ is in the intersection $\bigcap_{i=1}^{\ell} \partial G_i$. Then for
each $y$ for which $q_{i,x}'(X_{n+1})y \le 0$, each $i \le \ell$, we must have
$\tau_n' y \ge 0$. (Otherwise $\tau_n$ would not be the 'projection error' or,
equivalently, $X_{n+1}$ would not be the closest point on $\partial G$ to
$X_n + a_n h(X_n,\xi_n) \notin G$.) Thus, by Farkas' Lemma, $\tau_n$ must lie in the
cone $-C(X_{n+1})$, where

$C(x) = \{y : y = \sum_{i \in A(x)} \lambda_i q_{i,x}(x),\ \lambda_i \ge 0\}$,

where $A(x)$ is the set of constraints which are active at $x$.

Note also that since $\delta X_n \to 0$ w.p.1 (by (A1)), there is a real sequence
$0 < \mu_n \to 0$ such that $\tau_n = 0$ for large $n$ (w.p.1) if
distance$(X_n, \partial G) \ge \mu_n$, and $a_n\tau_n$ and $a_n E_n\tau_n \to 0$ w.p.1. These
facts will be used when characterizing the limit points below.

Now, following the argument in Theorem 1 but for smooth $f(\cdot)$ replacing
$V(\cdot)$, define

$f_0(x,n) = \sum_{j=n}^{\infty} a_j f_x'(x)E_n[h(x,\xi_j) - \bar h(x)]$

and define $\tilde f(n) = f(X_n) + f_0(X_n,n)$. Analogous to the result in Theorem 1,
there is a random sequence $\tilde\delta_n \to 0$ w.p.1 such that

(3.3)  $E_n\tilde f(n+1) - \tilde f(n) - \tilde\delta_n a_n - a_n f_x'(X_n)\bar h(X_n) - a_n f_x'(X_n)E_n\tau_n = 0$.

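We do not need to average out the $E_n\tau_n$ term. [In order to get (3.3), via
the method of Theorem 1, we must show that the difference

(3.4)  $E_n\sum_{j=n+1}^{\infty} a_j f_x'(X_{n+1})E_{n+1}[h(X_{n+1},\xi_j) - \bar h(X_{n+1})] - \sum_{j=n+1}^{\infty} a_j f_x'(X_n)E_n[h(X_n,\xi_j) - \bar h(X_n)]$
       $=\ E_n\,\delta X_n' \int_0^1 ds \sum_{j=n+1}^{\infty} a_j [f_x'(X_n + s\,\delta X_n)E_{n+1}(h(X_n + s\,\delta X_n,\xi_j) - \bar h(X_n + s\,\delta X_n))]_x$

is of an order $a_n\alpha_{5n}$ where $\alpha_{5n} \to 0$ w.p.1 as $n \to \infty$. But by (A1),
(A9), and an application of Schwarz's inequality, we have
$\alpha_{5n} = \alpha_{2n}E_n^{1/2}\alpha_{4,n+1}^2$.]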
We also have

(3.5)  $\tilde f(n) - \tilde f(0) - \sum_{i=0}^{n-1}\tilde\delta_i a_i - \sum_{i=0}^{n-1} a_i f_x'(X_i)\bar h(X_i) - \sum_{i=0}^{n-1} a_i f_x'(X_i)E_i\tau_i = \sum_{i=0}^{n-1} m_i \equiv M_n$,

where $\{M_n\}$ is a martingale and† $E\sum_i m_i^2 < \infty$. Finally, letting $f(\cdot)$
equal an arbitrary coordinate variable in $G$, and using the above square
integrability and the fact that $f_0(X_n,n) \to 0$ w.p.1, we get

(3.6)  $\sup_{m \ge n} |X_m - X_n - \sum_{i=n}^{m-1} a_i\bar h(X_i) - \sum_{i=n}^{m-1} a_i\tau_i| \to 0$ w.p.1

as $n \to \infty$.

†To get the inequality, we might have to alter $\{X_n, \xi_n\}$ on a set of
arbitrarily small probability, but as in Theorem 1, this does not alter the
conclusions.




By the properties of the projection terms $\{a_n\tau_n\}$, and the fact that the
'limit dynamics' implied by (3.6) is that of the 'projected' ODE (3.1), (3.6)
implies that (w.p.1) all limit points of $\{X_n\}$ must be limit points of (3.1).
The $\sum a_i\tau_i$ term simply compensates for the part of $\sum a_i\bar h(X_i)$ which
would take the trajectory out of $G$.

Now, let $\bar h(\cdot) = -H_x(\cdot)$, and use $H(\cdot)$ as a Liapunov function. Then

(3.7)  $\dot H(x) = H_x'(x)\pi(-H_x(x)) \le 0$.

Equation (3.7) implies that the limit points of (3.1) are contained in $KT$. Q.E.D.
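The role of the projection can be seen in a one-dimensional sketch of (1.2). Everything specific below is an assumption made for illustration and is not taken from the text: the regression $H(x) = (x-2)^2/2$, the constraint set $G = [-1, 1]$, the noisy observation $h(x,\xi) = -(x-2) + \xi$ with independent standard Gaussian $\xi$, the gain $a_n = 1/(n+1)$, and the names `project_interval` and `projected_sa`. The unconstrained minimizer lies outside $G$, so the set $KT$ reduces to the boundary point $x = 1$.

```python
import random


def project_interval(y, lo=-1.0, hi=1.0):
    """Closest point to y in G = [lo, hi] (the map pi_G for an interval)."""
    return min(max(y, lo), hi)


def projected_sa(n_steps, seed=2):
    """Iterate X_{n+1} = pi_G(X_n + a_n h(X_n, xi_n)) as in (1.2).

    Assumed example: h(x, xi) = -(x - 2) + xi, i.e. hbar = -H_x with
    H(x) = (x - 2)^2 / 2, whose minimizer 2 lies outside G = [-1, 1].
    """
    rng = random.Random(seed)
    x = -1.0
    for n in range(n_steps):
        a_n = 1.0 / (n + 1)
        xi = rng.gauss(0.0, 1.0)
        x = project_interval(x + a_n * (-(x - 2.0) + xi))
    return x


x_final = projected_sa(20000)
print(x_final)
```

The iterate stays in $G$ at every step and ends near the endpoint closest to the unconstrained minimizer, which is the behavior the projected ODE (3.1) predicts for this example.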

4.  STATE  DEPENDENT  NOISE 

It is often necessary to take explicit account of the way that the evolution
of $\{\xi_j,\ j \ge n\}$ depends on $\{X_j,\ j \le n\}$. We might use a parametrization of
the type

$\xi_n = g_n(\xi_{n-1}, X_n, X_{n-1}, \ldots, X_{n-k}, \psi_n)$, where $\{\psi_n\}$ is an 'exogenous' sequence.

Such a scheme was used in [5] and [6], where the $g_n$ were assumed to be
sufficiently smooth functions of the $X_n, X_{n-1}, \ldots$. In the development of
this section, we suppose that $\{X_n, \xi_{n-1},\ n \ge 1\}$ is a Markov process (not
necessarily stationary). In fitting this format to particular applications, it
might be required to 'Markovianize' the original (state, noise) process. Let
$E_n$ denote conditioning on $\{\xi_j,\ j < n,\ X_j,\ j \le n\}$. Define the 'partial'
transition function as follows. Define

$P(\xi,n,\Gamma,n+1|x) = P\{\xi_{n+1} \in \Gamma \mid \xi_n = \xi,\ X_{n+1} = x\}$.

In general, define $P(\xi,n,\Gamma,n+\alpha|x)$ by the convolution

$P(\xi,n,\Gamma,n+\alpha+\beta|x) = \int P(\xi,n,dy,n+\alpha|x)\,P(y,n+\alpha,\Gamma,n+\alpha+\beta|x)$.

Thus in calculating the above transition function, $X_j$ is held fixed at $x$
for $j \le n+\alpha+\beta$. This partial transition function is useful because, loosely
speaking, $\{\xi_n\}$ varies much faster than $\{X_n\}$ does. $V_0(x,n)$ is now written
in the form

(4.1)  $V_0(x,n) = \sum_{j=n}^{\infty} a_j V_x'(x)\left[\int h(x,\xi)P(\xi_{n-1},n-1,d\xi,j|x) - \bar h(x)\right]$.

Define $\tilde V(n) = V_0(X_n,n) + V(X_n)$. Note the way the averaging is done in (4.1)
compared to how it was done in the $V_0(x,n)$ of (A3). The integral in (4.1)
could be written as $E[h(x,\xi_j(x)) \mid \xi_{n-1}(x) = \xi_{n-1}]$, where for each $x, n$,
$\{\xi_j(x),\ j \ge n-1\}$ is a process which evolves according to the law
$P(\xi,\alpha,\Gamma,\beta|x)$, where $\alpha \ge n-1$ and $\xi_{n-1}(x) = \xi_{n-1}$. See [12] for other
applications of this idea. Suppose that the sum in (4.1) is continuously
differentiable in $x$, and that the derivatives can be taken termwise. Then
(4.2) replaces the sum in (A6):

(4.2)  $\sum_{j=n}^{\infty} a_j \left[V_x'(x)\left\{\int h(x,\xi)P(\xi_{n-1},n-1,d\xi,j|x) - \bar h(x)\right\}\right]_x$.

Theorem 3. Assume (A1)-(A7) but with the above cited replacements (4.1),
(4.2). Then the conclusions of Theorem 1 hold. The extensions to the unbounded
noise case stated in the remark after Theorem 1 also hold here. Under the
conditions of Theorem 2, subject to the above replacements, the conclusions of
Theorem 2 hold.

The proof is almost identical to that of Theorem 1 and (where appropriate)
Theorem 2. We note only the following. By the Markov property,

(4.3)  $E_n P(\xi_n,n,\Gamma,j|X_n) = P(\xi_{n-1},n-1,\Gamma,j|X_n)$,  $j \ge n$.

Note that the lowest term in (4.1) is (with $x = X_n$)
$a_n V_x'(X_n)[E_n h(X_n,\xi_n) - \bar h(X_n)]$, exactly as in the sum in (A3). In the
proof we get (4.4), the analog of the first two terms on the right of (2.3).

(4.4)  $\sum_{j=n+1}^{\infty} E_n a_j V_x'(X_{n+1})\left\{\int h(X_{n+1},\xi)P(\xi_n,n,d\xi,j|X_{n+1}) - \bar h(X_{n+1})\right\}$
       $-\ \sum_{j=n+1}^{\infty} E_n a_j V_x'(X_n)\left\{\int h(X_n,\xi)P(\xi_{n-1},n-1,d\xi,j|X_n) - \bar h(X_n)\right\}$

The left side of (4.3) replaces the right side of (4.3) in the second sum of
(4.4). Then the differentiability assumption, and the bounds in (A1)-(A7),
yield a bound analogous to the one obtained for the sums on the right of (2.3).
Remark. All the foregoing results hold if $\{a_n\}$ is random, under the
following additional conditions. $a_n$ depends on $\{X_i,\ i \le n\}$ only,
$\sum_n a_n = \infty$, $\sum_n a_n^2 < \infty$, $\sum_n |a_{n+1} - a_n| < \infty$ w.p.1 and with

(4.4)  $V_0(x,n) = a_n \sum_{j=n}^{\infty} E_n V_x'(x)[h(x,\xi_j) - \bar h(x)]$

replacing the $V_0(x,n)$ of (A3).

Remarks on weak convergence. Define $t_n = \sum_{i=0}^{n-1} a_i$,
$m(t) = \min\{n : t_n \ge t\}$ and $X^0(t) = X_n$ on $[t_n, t_{n+1})$ and
$X^n(t) = X^0(t + t_n)$ for $t \ge -t_n$ and $X^n(t) = X_0$ for $t \le -t_n$. Then the
previous theorems imply various strong convergence properties for $\{X^n(\cdot)\}$.
E.g., the proof of Theorem 1 implies that $\{X^n(\cdot)\}$ converges uniformly on
finite time intervals to a solution of (2.1) and that the limit path is
contained in the invariant set $S$ cited there. Under weaker conditions,
$\{X^n(\cdot)\}$ possesses various weak convergence (in $D^r[0,\infty)$) properties.
Here, we only cite a result; the proof is quite similar to that of Theorem 8
in [15].
Assume the conditions of Theorem 1, except replace
$\sum_n (1 + E\alpha_{0n}^2)a_n^2 < \infty$ by $a_n \to 0$ and $a_n\alpha_{0n} \to 0$ w.p.1. Weaken (A3)
to require only $\alpha_{1n} \to 0$ w.p.1 and $E\alpha_{1n} \to 0$. Then $\{X^n(\cdot)\}$ is tight,
and all limit paths are in $S$. Also $X_n \to S$ in probability. More strongly,
for each $T < \infty$, $\varepsilon > 0$,

$\lim_{n \to \infty} P\{\sup_{|t| \le T} \text{distance}(X^n(t), S) \ge \varepsilon\} = 0$.

If $E a_n\alpha_{3n} \to 0$ and $a_n\alpha_{3n} \to 0$ w.p.1 replace $\sum_n E a_n^2\alpha_{3n}^2 < \infty$, then
the above result also holds for the 'unbounded noise' case. There are similar
weak convergence versions of Theorems 2 and 3.


5.  EXAMPLES

5a.  Example 1.  Convergence of an adaptive quantizer. Frequently in
telecommunications systems, the signal is quantized and only the quantized
form is transmitted, in order to use the communications channel as efficiently
as possible. It is desirable to adapt the quantizer to the particular signal
[13,14], in order to maximize the quality of the received signal. Here a
stochastic approximation form of an adaptive quantizer will be studied. Let
$\xi(\cdot)$ denote the original stationary signal process and $\Delta$ a sampling
interval. Write $\xi(n\Delta) = \xi_n$. The signal $\xi(\cdot)$ is sampled at instants
$\{n\Delta,\ n = 0, 1, \ldots\}$, a quantization $Q(\xi_n)$ calculated, and only this
quantization is transmitted.

The quantizer is defined as follows. Let $L$ denote an integer, $x$ a
parameter, and $\{\rho_i, \eta_i\}$ real numbers such that
$0 = \rho_0 < \rho_1 < \ldots < \rho_{L-1} < \rho_L = \infty$, $0 < \eta_1 < \eta_2 < \ldots < \eta_L$. Then
define $Q(\xi_n) = x\eta_i$ if $\xi_n \in [x\rho_{i-1}, x\rho_i)$, and set $Q(z) = -Q(-z)$. In order
to maintain the fidelity of the signal which is reconstructed from the
sequence of received quantizations, the scaling parameter $x$ should increase
as the signal power increases.
Let $\beta \in (0,1]$ and let $0 < M_1 < \ldots < M_L < \infty$ with $M_1 < 1$, $M_L > 1$. A
typical adaptive quantizer (adapting the scale $x$) is defined by (5.1), where
$X_n$ is the scale value at the $n$th sampling instant.

(5.1)  $X_{n+1} = X_n^{\beta} B_n$,  $B_n = M_i$ for $X_n\rho_{i-1} \le |\xi_n| < X_n\rho_i$.
We will analyze a stochastic approximation version of (5.1). Let $\alpha > 0$ be
such that $a_n\alpha < 1$, and let $\{\ell_i\}$ be real numbers such that $\ell_1 < 0$,
$\ell_L > 0$, and $\ell_1 < \ell_2 < \ldots < \ell_L$. Let $0 < x_\ell < x_u < \infty$. Then we use

(5.2)  $X_{n+1} = \overline{X_n^{(1 - a_n\alpha)}(1 + a_n b_n)}$,

where $b_n = \ell_i$ if $X_n\rho_{i-1} \le |\xi_n| < X_n\rho_i$, and the bar denotes truncation
to $[x_\ell, x_u]$.

With $\alpha > 0$, the algorithm has some desirable robustness properties. The
algorithm (5.2) can be rewritten as (use $y^{1-\varepsilon} = y(1 - \varepsilon\log y) + O(\varepsilon^2)$)

(5.3)  $X_{n+1} = \overline{[X_n + a_n h(X_n,\xi_n) + O(a_n^2)]}$,

where

$h(X_n,\xi_n) = X_n b_n - \alpha X_n \log X_n$,

$h(x,\xi) = -\alpha x\log x + x\sum_{i=1}^{L} \ell_i I\{x\rho_{i-1} \le |\xi| < x\rho_i\}$,

$\bar h(x) = -\alpha x\log x + x\sum_{i=1}^{L} \ell_i P\{x\rho_{i-1} \le |\xi_n| < x\rho_i\}$.

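For specificity, let $\xi(\cdot)$ be a stationary Gaussian process. In
particular, for a matrix $M$ whose eigenvalues have negative real parts and a
standard Wiener process $w(\cdot)$, define $v(\cdot)$, $\xi(\cdot)$ by
$dv = Mv\,dt + C\,dw$, $\xi = Nv$. Let $\sigma_0^2 = \text{var}\,\xi(t)$. Suppose that
$\text{Cov}\,v(t) = \Sigma > 0$. We have

(5.5)  $\frac{d}{dx}\left(\frac{\bar h(x)}{x}\right) = \frac{2}{\sqrt{2\pi}\,\sigma_0}\sum_{i=1}^{L} \ell_i\left(\rho_i \exp\left(-\frac{\rho_i^2 x^2}{2\sigma_0^2}\right) - \rho_{i-1}\exp\left(-\frac{\rho_{i-1}^2 x^2}{2\sigma_0^2}\right)\right) - \alpha/x$
       $= \frac{2}{\sqrt{2\pi}\,\sigma_0}\sum_{i=1}^{L-1} (\ell_i - \ell_{i+1})\rho_i \exp(-\rho_i^2 x^2/2\sigma_0^2) - \alpha/x$.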


Thus $\bar h(x)/x$ is the sum of two strictly convex functions, the first being
bounded and having a negative slope and the second going to $\infty$ as $x \to 0$ and
to $-\infty$ as $x \to \infty$. Thus there is a unique $\bar x \in (0,\infty)$ such that
$\bar h(\bar x) = 0$. Also $\bar h(x) > 0$ for $x < \bar x$ and $\bar h(x) < 0$ for $x > \bar x$. We use
the 'unbounded' noise version of Theorem 2. See the remark after that theorem
statement.

Let $\sum_n a_n^2 < \infty$. Since $h(\cdot,\cdot)$ is bounded (for $x \in [x_\ell, x_u]$), we need
only verify (A3) for $f(\cdot) \in \hat C_0^2$ and (as noted in the remark after the proof
of Theorem 1) get the appropriate bound for the second term of (2.4), with
$f(\cdot) \in \hat C_0^2$ replacing $V(\cdot)$. Let $E_n$ denote conditioning on $v(j\Delta)$, $j < n$.

It can be verified that (the rate of convergence of the sum depends on
$v(n\Delta - \Delta)$)

(5.6)  $\sum_{j=n}^{\infty} a_j |E_n[h(x,\xi_j) - \bar h(x)]| \le a_n K(|v(n\Delta - \Delta)| + 1)$.

The right-hand side of (5.6) goes to zero w.p.1 as $n \to \infty$ and (A3) holds.
Next, using the fact that ($j \ge n$) $P\{x\rho_{i-1} \le |\xi_j| < x\rho_i \mid v(n\Delta - \Delta)\}$ is a
smooth and bounded function of $x$, and it and its $x$-derivative converge
(fast enough) to the unconditional probability and its $x$-derivative as
$j \to \infty$, we can get a bound of the form of the right-hand side of (5.6) on
the last term of (2.4) (using $f(\cdot) \in \hat C_0^2$ in lieu of $V(\cdot)$). Thus all the
conditions of the projection Theorem 2 hold. Hence if $\bar x \in [x_\ell, x_u]$, then
$X_n \to \bar x$ w.p.1; otherwise $X_n$ converges w.p.1 to the endpoint nearest to $\bar x$.
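The conclusion of Example 1 can be illustrated by simulating the truncated recursion (5.2) and comparing the final scale with the zero of $\bar h$. The sketch below simplifies the setting, and all of its specifics are assumptions rather than the model analyzed above: the samples $\xi_n$ are independent standard Gaussian (not a sampled Gauss-Markov process), $L = 2$ with $\rho = (0, 1, \infty)$ and $\ell = (-0.5, 0.5)$, $\alpha = 0.1$, $[x_\ell, x_u] = [0.1, 10]$, and $a_n = 4/(n+20)$. The helper names are hypothetical.

```python
import math
import random

RHO = [0.0, 1.0, math.inf]  # thresholds rho_0 < rho_1 < rho_2 (assumed)
ELL = [-0.5, 0.5]           # levels ell_1 < 0 < ell_2 (assumed)
ALPHA = 0.1
X_LO, X_HI = 0.1, 10.0      # truncation interval [x_l, x_u] (assumed)


def hbar_over_x(x, sigma=1.0):
    """hbar(x)/x = -alpha log x + sum_i ell_i P{x rho_{i-1} <= |xi| < x rho_i}."""
    total = -ALPHA * math.log(x)
    for i, ell in enumerate(ELL):
        lo, hi = RHO[i], RHO[i + 1]
        # P{|xi| < c} = erf(c / (sigma sqrt 2)) for Gaussian xi.
        p_hi = 1.0 if math.isinf(hi) else math.erf(x * hi / (sigma * math.sqrt(2.0)))
        p_lo = math.erf(x * lo / (sigma * math.sqrt(2.0)))
        total += ell * (p_hi - p_lo)
    return total


def root_of_hbar(lo=X_LO, hi=X_HI):
    """Bisection for the unique zero; hbar(x)/x is decreasing in x."""
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if hbar_over_x(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def adaptive_scale(n_steps, seed=3):
    """Run (5.2): X_{n+1} = trunc(X_n^(1 - a_n alpha) (1 + a_n b_n))."""
    rng = random.Random(seed)
    x = 5.0
    for n in range(n_steps):
        a_n = 4.0 / (n + 20)
        mag = abs(rng.gauss(0.0, 1.0))
        b_n = next(ell for i, ell in enumerate(ELL)
                   if x * RHO[i] <= mag < x * RHO[i + 1])
        x = x ** (1.0 - a_n * ALPHA) * (1.0 + a_n * b_n)
        x = min(max(x, X_LO), X_HI)
    return x


x_bar = root_of_hbar()
x_final = adaptive_scale(40000)
print(x_bar, x_final)
```

With these assumed parameters the zero of $\bar h$ lies inside the truncation interval, so the final scale is expected to settle near it.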
5b.  A Kiefer-Wolfowitz Procedure with Observation Averaging. Let $f(\cdot)$ be a
real valued function on $R^1$, whose first and second derivatives are bounded on
$R^1$. Suppose that there is a unique $\theta$ (let $\theta = 0$) such that $f_x(\theta) = 0$ and
let there be $\varepsilon_i > 0$ such that $f_x(x) \le -\varepsilon_1$ for $x \le -\varepsilon_2$, and
$f_x(x) \ge \varepsilon_1$ for $x \ge \varepsilon_2$. Let $\{a_n, c_n\}$ be sequences of positive real numbers
tending to zero and such that $\sum_n a_n = \infty$, $\sum_n a_n c_n < \infty$,
$\sum_n a_n^2/c_n^4 < \infty$, $\limsup_n \sup_{j \ge n} a_j/a_n < \infty$,
$\limsup_n \sup_{j \ge n} c_j/c_n < \infty$. Let $\{\psi_j\}$ be a sequence of mutually
independent mean zero random variables with variances bounded by $\sigma^2 < \infty$,
and 4th moment by $m_4 < \infty$. Let $0 < \alpha < 1$, $\beta > 0$ and define $\{X_n, \xi_n\}$ by

(5.7)  $X_{n+1} = X_n + a_n\xi_n$,  $n \ge 1$,
       $\xi_n = \alpha\xi_{n-1} - \beta\,\frac{f(X_n + c_n) - f(X_n - c_n)}{2c_n} - \beta\psi_n/c_n$
       $= \alpha\xi_{n-1} - \beta f_x(X_n) + \theta_n - \beta\psi_n/c_n$,

where $|\theta_n| \le Kc_n$.

With $\alpha = 0$, we have a form of the Kiefer-Wolfowitz (KW) process. Then
$\sum_n a_n^2/c_n^2 < \infty$ can replace $\sum_n a_n^2/c_n^4 < \infty$. With $0 < \alpha < 1$, the
observations are averaged with exponentially decreasing weights. The
conditions on $\{\psi_n\}$ and on $f(\cdot)$ can be relaxed, but the technique will be
well illustrated with the given conditions. Here $h(x,\xi) = \xi$, which is not a
priori bounded, and in fact is 'state' dependent. We show that $X_n \to 0$ w.p.1,
via Theorem 1 (extension for unbounded noise). Define
$\bar h(x) = -\beta f_x(x)/(1-\alpha)$. For notational convenience we drop the $\theta_n$ in
(5.7). It can readily be carried through with little additional difficulty.
Define $V(x) = x^2$.

Conditions (A2) and (A5) obviously hold. Define $\{\bar\xi_n, \tilde\xi_n\}$ by
$\bar\xi_n = \alpha\bar\xi_{n-1} - \beta f_x(X_n)$, $\tilde\xi_n = \alpha\tilde\xi_{n-1} - \beta\psi_n/c_n$. Clearly $\{\bar\xi_n\}$ is
uniformly bounded and $\tilde\xi_n = -\beta\sum_{i=1}^{n} \alpha^{n-i}\psi_i/c_i$. Now,
$a_n^2 E\tilde\xi_n^2 \le \beta^2 a_n^2 \sum_{i=1}^{n} \alpha^{2n-2i}\sigma^2/c_i^2$

and

$\sum_n a_n^2 E\tilde\xi_n^2 < \infty$.

Thus (A1) holds and $a_n\xi_n \to 0$ w.p.1. Now check (A4). We have

(5.8)  $E a_n^2\left(\sum_{i=1}^{n} \alpha^{n-i}\psi_i/c_i\right)^4 \le K a_n^2 \sum_{i,j=1}^{n} \alpha^{2(n-i)}\alpha^{2(n-j)}\sigma^4/c_i^2 c_j^2 + a_n^2\sum_{i=1}^{n} \alpha^{4(n-i)} m_4/c_i^4 \equiv \gamma_n$.

Since $\sum_n \gamma_n < \infty$, we have $a_n\tilde\xi_n^2 \to 0$ w.p.1 and $a_n E_n\xi_n^2 \to 0$ w.p.1.
Thus (A4) holds.

Define $\{\bar\xi_j(x),\ j \ge n\}$, $\bar V_0(x,n)$, $\tilde V_0(x,n)$, by

$\bar\xi_j(x) = \alpha\bar\xi_{j-1}(x) - \beta f_x(x)$,  $\bar\xi_{n-1}(x) = \bar\xi_{n-1}$,

$\bar V_0(x,n) = \sum_{j=n}^{\infty} a_j V_x(x)[\bar\xi_j(x) - \bar h(x)]$,

$\tilde V_0(x,n) = \sum_{j=n}^{\infty} a_j V_x(x)E_n\tilde\xi_j$.

Then $|\bar V_0(x,n)| \le K a_n(1 + |x|)$. Also

$E_n\tilde\xi_j = \alpha^{j+1-n}\tilde\xi_{n-1}$

and

$\tilde V_0(x,n) = \left(\sum_{j=n}^{\infty} a_j\alpha^{j+1-n}\right)V_x(x)\tilde\xi_{n-1}$.

These representations can be used to readily show that both (A3) and the
condition mentioned for the unbounded noise case after the proof of Theorem 1
hold. Thus, by (the unbounded noise extension of) Theorem 1, $X_n \to 0$ w.p.1.
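The recursion (5.7) is short enough to simulate directly. The sketch below drops nothing from (5.7) but fixes several quantities that the text leaves free, all of them assumptions made for illustration: $f(x) = \sqrt{1 + x^2}$ (bounded first and second derivatives, unique minimum at $\theta = 0$), $a_n = 1/n$, $c_n = n^{-1/6}$ (so that $\sum a_n c_n < \infty$ and $\sum a_n^2/c_n^4 < \infty$), standard Gaussian $\psi_n$, $\alpha = 0.5$, $\beta = 1$, and the function name `kw_with_averaging`.

```python
import math
import random


def f(x):
    """Assumed regression function; f_x and f_xx are bounded, minimum at 0."""
    return math.sqrt(1.0 + x * x)


def kw_with_averaging(n_steps, alpha=0.5, beta=1.0, seed=4):
    """Iterate (5.7): X_{n+1} = X_n + a_n xi_n, where
    xi_n = alpha xi_{n-1} - beta [f(X_n + c_n) - f(X_n - c_n)] / (2 c_n)
           - beta psi_n / c_n.
    """
    rng = random.Random(seed)
    x, xi = 2.0, 0.0  # arbitrary initial scale and averaged observation
    for n in range(1, n_steps + 1):
        a_n = 1.0 / n
        c_n = n ** (-1.0 / 6.0)
        psi = rng.gauss(0.0, 1.0)  # observation noise psi_n
        diff = (f(x + c_n) - f(x - c_n)) / (2.0 * c_n)
        xi = alpha * xi - beta * diff - beta * psi / c_n
        x += a_n * xi
    return x


x_final = kw_with_averaging(50000)
print(x_final)
```

With $\alpha = 0$ the same function reduces to a plain Kiefer-Wolfowitz step; with $0 < \alpha < 1$ the finite-difference observations are averaged with exponentially decreasing weights before being used, as described above.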